
All Questions

1 vote · 1 answer · 356 views

SKLearn PCA explained_variance_ratio_ cumsum gives an array of 1s

I have a problem with PCA. I read that PCA needs clean numeric values. I started my analysis with a dataset called trainDf with shape ...
Kalizi
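A minimal sketch, assuming scikit-learn and NumPy, of one common reason the cumulative sum looks like all ones. The excerpt is truncated, so this is an assumption about the asker's data. When every component is kept, `np.cumsum(explained_variance_ratio_)` always ends at 1. If one column is on a far larger scale than the rest, the first component captures almost all of the variance, so every entry already prints as 1. The data below is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: four features, the last one on a much larger scale
# (e.g. an unscaled ID or timestamp column left in the frame).
X = rng.normal(size=(200, 4))
X[:, 3] *= 1e6

# Unscaled: the large-scale column dominates, so the first ratio is ~1.0
cum_raw = np.cumsum(PCA().fit(X).explained_variance_ratio_)
print(cum_raw)

# Standardized: variance is spread across components; the sum reaches 1.0 only at the end
cum_scaled = np.cumsum(PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_)
print(cum_scaled)
```

Standardizing the columns first is one way to keep a single large-scale column from dominating the decomposition.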
1 vote · 0 answers · 69 views

Trying to understand a PCA output

I recently ran code to generate a PCA for a movie ratings dataset. There were actually two different datasets, a 'movies' one and a 'ratings' one. The movies dataset had about 9700 rows of different movie titles ...
ShridharK
2 votes · 1 answer · 704 views

Perform PCA on columns of different lengths

I have about 20-30 columns, all with different lengths. The first column has 25000 rows, the second column has 19000 rows, and it's different for every column. All are survey data with 0 (No), 1(...
Bishal Th.
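PCA needs a rectangular matrix, so columns of unequal length have to be made the same length first. A minimal sketch, assuming pandas and scikit-learn and assuming that row i means the same respondent in every column (if it does not, padding by position is meaningless). The column names `q1`, `q2`, `q3` and their values are hypothetical. Building a DataFrame pads shorter columns with NaN; the two options shown are dropping incomplete rows or imputing the gaps.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer

# Hypothetical survey answers (0 = No, 1 = Yes); each column has a different length.
columns = {
    "q1": pd.Series([0, 1, 1, 0, 1, 0]),
    "q2": pd.Series([1, 1, 0, 0]),
    "q3": pd.Series([0, 0, 1, 1, 1]),
}
# The DataFrame aligns the Series on their index; shorter columns are padded with NaN.
df = pd.DataFrame(columns)
print(df.isna().sum())

# Option A: keep only rows where every column has an answer.
complete = df.dropna()

# Option B: fill the gaps, here with the most frequent answer per column.
imputed = SimpleImputer(strategy="most_frequent").fit_transform(df)

pca = PCA(n_components=2).fit(imputed)
print(pca.explained_variance_ratio_)
```

Which option is appropriate depends on why the columns differ in length; imputation invents answers, while dropping rows can discard most of the data.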
1 vote · 0 answers · 627 views

Training PCA on BERT word embedding: entire training dataset or each document?

I want to reduce the dimensionality of the BERT word embedding to, let's say, 50 dimensions. I am trying with PCA. I will use that for the document classification task. Now for training PCA, should ...
user3363813
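One common approach is to fit a single PCA on the token embeddings of the whole training set and reuse that fitted projection for every document. Fitting per document would give each document its own incompatible axes, and a document with fewer than 50 tokens cannot yield 50 components at all. A minimal sketch, assuming scikit-learn; the random 768-dimensional arrays are stand-ins for real BERT output, and `train_docs` / `test_docs` are hypothetical names.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-ins for BERT token embeddings: one (n_tokens, 768) array per document.
train_docs = [rng.normal(size=(rng.integers(20, 60), 768)) for _ in range(30)]
test_docs = [rng.normal(size=(rng.integers(20, 60), 768)) for _ in range(5)]

# Fit one PCA on all training tokens stacked together, so every document
# is projected into the same 50-dimensional space.
pca = PCA(n_components=50).fit(np.vstack(train_docs))

# Reuse the fitted projection (no refitting) for training and test documents.
train_reduced = [pca.transform(doc) for doc in train_docs]
test_reduced = [pca.transform(doc) for doc in test_docs]
```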
1 vote · 1 answer · 390 views

No Access to Confusion Matrix in SVM.SVC.score (Scikit-learn, Python)

I used the SVM.SVC function to classify. But when I wanted to calculate the weighted and unweighted average accuracy, I couldn't access the confusion matrix, because svm.SVC.score only provides a ...
Farid J. Maleki
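`SVC.score` returns only the mean accuracy, but the confusion matrix can be built from the classifier's predictions with `sklearn.metrics.confusion_matrix`. A minimal sketch on the bundled iris data. It assumes the usual definitions (not stated in the excerpt): weighted accuracy is overall accuracy, and unweighted accuracy is the mean of per-class recalls, which scikit-learn also exposes as `balanced_accuracy_score`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = SVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Build the confusion matrix from the predictions instead of relying on score().
cm = confusion_matrix(y_test, y_pred)

# Weighted accuracy: fraction of all test samples classified correctly.
weighted = np.trace(cm) / cm.sum()
# Unweighted accuracy: mean of per-class recalls, so every class counts equally.
unweighted = np.mean(np.diag(cm) / cm.sum(axis=1))
print(cm, weighted, unweighted)
```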
1 vote · 1 answer · 408 views

How to structure my data into features and targets for PCA on Big Data?

I want to apply the PCA algorithm from Scikit-Learn (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html). At the part where I have to separate the features and the ...
Ariadne R.
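PCA is unsupervised: it is fitted on the feature columns only, and the target is simply set aside. For data too large to decompose in one go, scikit-learn's `IncrementalPCA` can be fitted batch by batch with `partial_fit`. A minimal sketch, assuming an in-memory DataFrame with a hypothetical `label` target column; with real big data the batches would instead be read from disk (for example with `pandas.read_csv(..., chunksize=...)`).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
# Hypothetical table: 20 feature columns plus a "label" target column.
df = pd.DataFrame(rng.normal(size=(10_000, 20)), columns=[f"f{i}" for i in range(20)])
df["label"] = rng.integers(0, 2, size=len(df))

# Features go into PCA; the target is kept separately for a later model.
X = df.drop(columns="label").to_numpy()
y = df["label"].to_numpy()

# Feed the features in batches; each batch must have at least n_components rows.
ipca = IncrementalPCA(n_components=5)
batch = 2_000
for start in range(0, len(X), batch):
    ipca.partial_fit(X[start:start + batch])

X_reduced = ipca.transform(X)
```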
1 vote · 1 answer · 327 views

Python sklearn PCA transform function output does not match

I am computing PCA on some data using 10 components and using 3 out of 10 as: ...
shaifali Gupta
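The asker's code is elided, so the cause is not known. A minimal sketch, assuming scikit-learn's default `whiten=False`, of what keeping 3 of 10 components should match. It takes the first three columns of `transform`, which equals centering with the fitted `mean_` and projecting onto the first three rows of `components_`. Common mismatches are forgetting to subtract `mean_`, using columns of `components_` instead of rows, or fitting with `whiten=True`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))

pca = PCA(n_components=10).fit(X)
scores = pca.transform(X)

# Keeping 3 of the 10 components: take the first 3 columns of the scores.
first3 = scores[:, :3]

# The same projection by hand: center with the training mean, then project
# onto the first 3 principal axes (rows of components_).
manual = (X - pca.mean_) @ pca.components_[:3].T
print(np.allclose(first3, manual))
```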
3 votes · 2 answers · 1k views

Why do we choose principal components based on maximum variance explained?

I've seen many people choose the number of principal components for PCA based on maximum variance explained. So my question is: do we always have to choose principal components based on maximum variance ...
Anjith
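The variance-based rule people usually apply is a cumulative threshold: keep the smallest number of components whose explained variance exceeds some fraction, such as 95%. A minimal sketch on the bundled digits data. The 0.95 threshold is an arbitrary convention, not a requirement, and a downstream metric (e.g. cross-validated accuracy) is an alternative criterion. Passing a float in (0, 1) as `n_components` makes scikit-learn apply the same rule.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# Fit all components and count how many are needed to exceed 95% of the variance.
full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
k = int(np.argmax(cumulative > 0.95)) + 1

# A float n_components asks scikit-learn to pick the same number automatically.
auto = PCA(n_components=0.95, svd_solver="full").fit(X)
print(k, auto.n_components_)
```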
0 votes · 1 answer · 256 views

Guidance needed with dimension reduction for clustering - some numerical, lots of categorical data

I have my data in a Pandas DataFrame with 25,000 rows and 1,500 columns without any NaNs. About 30 of the columns contain numerical data, which I standardized with StandardScaler(). The rest are columns with ...
Zin Yosrim
3 votes · 0 answers · 480 views

PCA and FastICA in scikit-learn giving near identical results

So after importing my data, transforming it, and splitting it into training and test sets, I tried running this script for PCA: ...
Jon M
2 votes · 3 answers · 15k views

sklearn.decomposition.PCA explained_variance_ratio_ attribute does not exist

When trying to identify the variance explained by the first two columns of my dataset using the explained_variance_ratio_ attribute of ...
Lobke
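In scikit-learn, attributes with a trailing underscore such as `explained_variance_ratio_` are created by `fit`, so accessing them on an unfitted `PCA` object raises `AttributeError`. The ratios also describe principal components, not the original columns, so they do not directly give the variance explained by the first two columns of the dataset. A minimal sketch, assuming synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(50, 5))

pca = PCA(n_components=2)
# Fitted attributes do not exist until fit() has run.
before_fit = hasattr(pca, "explained_variance_ratio_")
print(before_fit)  # False

pca.fit(X)
# One ratio per kept principal component (not per original column).
print(pca.explained_variance_ratio_)
```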
3 votes · 1 answer · 4k views

ValueError: operands could not be broadcast together with shapes (60002,39) (38,) during pca.transform

I am trying to solve the San Francisco Crime Problem on Kaggle. To begin with, here is my code: ...
Prashant Pandey
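The shapes in the error suggest PCA was fitted on 38 columns but `transform` received 39. One common cause, assumed here since the code is elided, is one-hot encoding the training and test sets separately, so a category present only in the test set adds an extra dummy column. A minimal sketch with hypothetical `day`/`x` columns; it aligns the test columns to the training columns with `DataFrame.reindex` before transforming. Recent scikit-learn versions check the feature count up front and raise a clearer `ValueError` instead of a broadcasting error.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical frames: the test set contains a category ("Sun") unseen in training.
train = pd.DataFrame({"day": ["Mon", "Tue", "Wed", "Mon"], "x": [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({"day": ["Mon", "Tue", "Wed", "Sun"], "x": [5.0, 6.0, 7.0, 8.0]})

X_train = pd.get_dummies(train, columns=["day"], dtype=float)
X_test = pd.get_dummies(test, columns=["day"], dtype=float)
print(X_train.shape[1], X_test.shape[1])  # 4 5

# Align test columns to training: unseen dummies are dropped, missing ones
# are filled with 0, and the column order matches training.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

pca = PCA(n_components=2).fit(X_train)
reduced = pca.transform(X_test)
```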
1 vote · 0 answers · 53 views

PCA Reduction resulted in an elliptical form

I have a dataset with 19 features (columns). I normalized them using sklearn.preprocessing.normalize and then used PCA to reduce them to 2 components for plotting ...
Ahmedn1
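`sklearn.preprocessing.normalize` rescales each row (sample) to unit norm rather than each column, so every point ends up on the surface of a unit hypersphere. Projecting such points to 2D may produce a curved or elliptical outline; whether that explains the plot in the question is an assumption. `StandardScaler` is the column-wise alternative. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

rng = np.random.default_rng(0)
# Synthetic data: three features on very different scales.
X = rng.normal(loc=5.0, scale=[1, 10, 100], size=(200, 3))

# normalize() works row by row: each sample is rescaled to unit L2 norm.
X_rows = normalize(X)
row_norms = np.linalg.norm(X_rows, axis=1)
print(row_norms[:3])

# StandardScaler works column by column: each feature gets mean 0 and variance 1.
X_cols = StandardScaler().fit_transform(X)
print(X_cols.mean(axis=0).round(6), X_cols.std(axis=0).round(6))
```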
1 vote · 0 answers · 213 views

Predict rank from physical measurements with various lengths

I have physical measurements of length $2n$, where the first vector represents a charge or a capacity $C$ (in coulombs) and the second one is a voltage $V$. Let's call this measurement "forming". A ...
hyamanieu
0 votes · 1 answer · 14k views

How to annotate labels in a 3D matplotlib scatter plot?

I have made a 3x3 PCA matrix with sklearn.decomposition PCA and plotted it in a matplotlib 3D scatter plot. How can I annotate labels near the points/markers? Here ...
user6131832
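On a 3D axes, matplotlib's `Axes3D.text(x, y, z, s)` places a string at a 3D coordinate, so labels can be added in a loop over the points. A minimal sketch, assuming matplotlib 3.2 or later (where `projection="3d"` works without importing `mpl_toolkits` explicitly). The point coordinates and the labels `A`, `B`, `C` are hypothetical stand-ins for the 3x3 PCA result.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 3x3 PCA output: 3 points, 3 components each.
points = np.array([[0.1, 0.4, 0.2], [0.7, 0.3, 0.9], [0.5, 0.8, 0.4]])
labels = ["A", "B", "C"]  # hypothetical labels, one per point

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2])

# Place each label at its point's (x, y, z) coordinate.
for (x, y, z), label in zip(points, labels):
    ax.text(x, y, z, label)

# Call plt.show() or fig.savefig(...) to display or save the figure.
```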
